{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Solving Double Integrator with SVP\n",
    "\n",
    "This notebook shows how to use the proposed structured value-based planning (SVP) approach to generate the state-action $Q$-value function for the classic double integrator problem. The correctness of the solution is verified by trajectory simulations."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Problem Definition\n",
    "\n",
    "The problem is defined as a unit mass brick moving along the $x$-axis on a frictionless surface, with a control input which provides a horizontal force.\n",
    "The brick starts from the position-velocity pair $\\left(x, \\dot{x}\\right)$ and follows the dynamics\n",
    "\n",
    "\\begin{align*}\n",
    "    x &:= x + \\dot{x}\\cdot\\tau, \\\\\n",
    "    \\dot{x} &:= \\dot{x} + u\\cdot\\tau,\n",
    "\\end{align*}\n",
    "\n",
    "where $u \\in \\left[ -1, 1 \\right]$ is the acceleration input, $\\tau$ is the time interval between decisions. The brick can take on the state values $\\left(x, \\dot{x}\\right) \\in \\left[ -3., 3. \\right] \\times \\left[ -3., 3. \\right]$. To incentivize getting to the original point of the $x$-axis with minimum control input, the reward function is defined in a quadratic form as \n",
    "\n",
    "\\begin{equation*}\n",
    "    r(x,\\dot{x}) = - \\frac{1}{2} \\left( x^2 + \\dot{x}^2 \\right).\n",
    "\\end{equation*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Structured Value-based Planning (SVP)\n",
    "\n",
    "The proposed structured value-based planning (SVP) approach is based on the $Q$-value iteration. At the $t$-th iteration, instead of a full pass over all state-action pairs:\n",
    "- SVP first randomly selects a subset $\\Omega$ of the state-action pairs. In particular, each state-action pair in $\\mathcal{S}\\times\\mathcal{A}$ is observed (i.e., included in $\\Omega$) independently with probability $p$. \n",
    "- For each selected $(s,a)$, the intermediate $\\hat{Q}(s,a)$ is computed based on the $Q$-value iteration: \n",
    "    \\begin{equation*}\\hat{Q}(s,a) \\leftarrow \\sum_{s'} P(s'|s,a) \\left( r(s,a) + \\gamma \\max_{a'} Q^{(t)}(s',a') \\right),\\quad\\forall\\:(s,a)\\in\\Omega.\n",
    "    \t\t\t\t\\end{equation*}\n",
    "- The current iteration then ends by reconstructing the full $Q$ matrix with matrix estimation, from the set of observations in $\\Omega$. That is, $Q^{(t+1)}=\\textrm{ME}\\big(\\{\\hat{Q}(s,a)\\}_{(s,a)\\in\\Omega}\\big).$\n",
    "\n",
    "Overall, each iteration reduces the computation cost by roughly $1-p$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Through SVP, we can compute the final state-action $Q$-value function.\n",
    "To obtain the optimal policy for state $s$, we compute\n",
    "\n",
    "\\begin{align*}\n",
    "    \\pi^{\\star} \\left(s\\right) = \\mbox{argmax}_{a \\in \\mathcal{A}} Q^{\\star}\\left(s, a\\right).\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Generate state-action value function with SVP"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "push!(LOAD_PATH, \".\")\n",
    "using MDPs, DoubleIntegrator\n",
    "mdp = MDP(state_space(), action_space(), transition, reward)\n",
    "__init__()\n",
    "policy = value_iteration(mdp, true, \"../data/qdi_otf_0.2.csv\", true)\n",
    "print(\"\")  # suppress output"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Visualize policy as a heat map"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "viz_policy(mdp, policy, \"SVP policy (20%)\", true, \"di/policy_di_0.2.tex\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true,
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Verify correctness\n",
    "Simulate and visualize trajectory from initial state `[position, speed]`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "ss, as = simulate(mdp, policy, [-0.5, 0.0])\n",
    "viz_trajectory(ss, as, \"SVP trajectory (20%)\", \"SVP input (20%)\", true, \"di/traj_di_0.2.tex\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "average times: 199.800"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "WARNING: Base.@printf is deprecated: it has been moved to the standard library package `Printf`.\n",
      "Add `using Printf` to your imports.\n",
      "  likely near /home/yuzhe/.julia/packages/IJulia/gI2uA/src/kernel.jl:52\n"
     ]
    }
   ],
   "source": [
    "nsim = 1000\n",
    "times = 0\n",
    "for sim = 1:nsim\n",
    "    state = [rand(XMIN:0.001 * (XMAX - XMIN):XMAX), \n",
    "             rand(VMIN:0.001 * (VMAX - VMIN):VMAX)]\n",
    "    _, as = simulate(mdp, policy, copy(state))\n",
    "    times += length(as)\n",
    "end # for sim\n",
    "times /= nsim\n",
    "@printf(\"average times: %.3f\", times)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Julia 0.7.0",
   "language": "julia",
   "name": "julia-0.7"
  },
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "0.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
